ABSTRACT

Recommender systems have emerged as a new weapon to
help online firms realize many of their strategic goals (e.g.,
improving sales, revenue and customer experience). However,
many existing techniques approach these goals indirectly, by
seeking to recover preferences (e.g., estimating ratings) in a
matrix completion framework. This paper aims to bridge the
significant gap between the clearly defined strategic objectives
and this not-so-well-justified proxy.

We show that it is advantageous to think of a recommender
system by analogy with a monopoly economic market, with
the system as the sole seller, users as the buyers and items as
the goods. This new perspective motivates a game-theoretic
formulation for recommendation that enables us to identify
the optimal recommendation policy by explicitly optimizing
certain strategic goals. In this paper, we revisit and extend
our prior work, the Collaborative-Competitive Filtering (CCF)
preference model [32], towards a game-theoretic framework.
The proposed framework consists of two components: first,
a conditional preference model that characterizes how a user
would respond to a recommendation action; second, knowing
in advance how the user would respond, a formulation of how
a recommender system should act (i.e., recommend) strategically
to maximize its goals. We show how objectives such as
click-through rate, sales revenue and consumption diversity
can be optimized explicitly in this framework. Experiments
conducted on a commercial recommender system
demonstrate promising results.

Categories and Subject Descriptors 

H.5.3 [Information Systems]: Web-based Interaction;

H.3.3 [Information Search and Retrieval]: Information filtering

General Terms 

Algorithms, Performance 

Keywords: Recommendation optimization, Collaborative
games, Econometric model, Expected utility theory

1. INTRODUCTION

Recommender systems have become a core component of
today's online businesses. By connecting merchant supply
(i.e., items of various types such as retail products, movies,
articles, ads and experts) to market demand (i.e., potentially
interested consumers), recommender systems help online firms
(e.g., Amazon, Netflix, Yahoo!) realize many of their hard-to-attain
business goals (e.g., boosting sales, improving revenue,
enhancing customer experience) [5] [6] [9] [30]. Compared to
an offline market, an online recommender system offers unbeatable
convenience in controlling, intervening in, monitoring and
measuring the market, and consequently an appealing opportunity
to adjust its operational actions (i.e., its recommendation policy)
to optimize certain strategic objectives. Surprisingly, although
many of these goals are clearly defined, today's recommender
systems do not optimize them in a well-justified way. Instead,
research on recommendation has focused almost exclusively
on learning preferences (e.g., estimating a user's rating of a
movie) in a matrix completion formulation [27] [22] [15] [31].
It is rather unclear how well preference learning, as a proxy,
approximates these goals, or how a strategic intervention should
be designed to achieve them.

In this paper, we seek to bridge this significant gap. We
show that it is advantageous to look at user-system interactions,
rather than user-item interactions as in the conventional
matrix completion formulation, and to think of a recommender
system by analogy with a monopoly economic market (i.e., the
system as the sole seller, users as buyers and items as goods^1).
This new perspective motivates a novel game-theoretic
formulation, upon which the recommendation policy can be
optimized strategically with respect to business objectives
such as click-through rate, sales revenue and consumption
diversity.

1.1 User-System Interactions 

Recommender systems are commonly designed by analyzing
the dyadic user-item interactions that can be recorded in
a matrix, for example, users assigning ratings to movies.
Research has thus focused exclusively on estimating
preferences or, equivalently, completing the "who-likes-what"
matrix [27] [22] [31] [15] [8]. This matrix-completion
formulation of recommendation has been extensively investigated
and became especially popular thanks to the Netflix
Prize competition. Nonetheless, as we show in this paper,
formulating recommendation as user-item interaction
or matrix completion is inherently flawed: recommendation
is not solely about what you know (i.e., knowledge about
the user), but more importantly about how you act (i.e., how
to recommend items to serve the user or persuade the user
to consume). Instead, it is advantageous to think of
recommendation as an interaction between the system and the
users, and to formulate it as an interdependent decision-making
process (i.e., a game) [16].

^1 Hereafter, we use "system" and "seller", "user" and "buyer",
and "item" and "good" interchangeably.

In a typical interaction, the system acts by providing a set
of personalized recommendations, and the user reacts by making
choices, i.e., by choosing to consume some of the recommended
items (e.g., clicking a link, renting a movie, viewing a news
article, purchasing a product). In many respects, this process
resembles a monopoly market, in which the recommender
system, as the sole seller, has absolute market power to
manipulate the market, yet the utility it receives depends on
the reaction of the buyers (i.e., users). For example, the
success of an advertising system is directly related to how
users react (i.e., whether or not they click the ads). Clearly,
the action of the seller and the reaction of the buyer are
interdependent: the two players (i.e., seller and buyer) each
have their own objective (i.e., utility) to achieve, yet how and
to what extent each can achieve its objective depends also on
the decision of the other player. The conventional matrix
completion formulation of recommendation is inherently
flawed because it is incapable of capturing such interdependent
decision-making interactions. As a result, although many
business objectives in e-commerce are clearly defined, how a
recommender can be designed to optimize these goals has not
yet been explored.

1.2 Recommendation as Collaborative Games 

In this paper, we present a game-theoretic formulation
for recommendation, in which user-system interactions are
modeled as a collection of coupled games, each played
between the seller and one buyer (i.e., between the
system and one user). For the sake of statistical inference,
it is nonetheless important not to model these games as
mutually independent. We therefore bring forward the notion
of "collaborative games", in which similar games are expected
to yield similar outcomes; this enables us to pool the sparse
data across games to obtain reliable statistical estimates.

We extend our prior work on the "Collaborative-Competitive
Filtering" (CCF) preference model [32] towards a game-theoretic
framework. The framework consists of two components: (1)
a conditional model $p_u(R \mid A)$ that characterizes the reaction
$R$ of a buyer $u$ in the context of any given action $A$ of the
seller, which enables us to predict in advance the outcome of
a game (e.g., how the buyer would respond); and (2) given
$p_u(R \mid A)$ for every buyer $u$, a formulation for optimizing the
seller's action policy $A$ with respect to a predefined payoff
(e.g., a strategic goal).

To model $p_u(R \mid A)$ effectively, we revisit and extend the
CCF preference model [32], which integrates latent factor models
from collaborative filtering with choice models from econometrics.
By using a latent-factor-based utility parametrization,
the model encodes the "collaboration effects" among games
[3] [23], in line with the notion of "collaborative games". Because
the policy spaces are prohibitively large yet the observations
are extremely sparse, this formulation is essential for reliable
statistical inference: it enables the sparse data to be pooled
across games. It also reduces the parametric complexity of
$p_u(R \mid A)$ from a prohibitive high-order polynomial scale
down to a linear scale.

The knowledge about users' reaction behavior, as characterized
by $p_u(R \mid A)$, enables us to predict the "future" (i.e., the
user's reaction) with uncertainty and, further, to optimize the
action (i.e., the recommendation policy) of the recommender
system strategically [16]. For any input action $A$, the possible
outcomes of the games occur with probabilities defined
by $p_u(R \mid A)$. Given a payoff (i.e., a function of the outcome)
that is von Neumann-Morgenstern rational, expected utility
theory asserts that the best action is the one that maximizes
the expected payoff [25]. We show how business objectives
such as click-through rate, sales revenue and consumption
diversity can be formulated explicitly as expected utilities
and used in turn to optimize a recommender system's action
policy.

We also show that the CCF model is sequentially rational
and thus approximates the perfect Nash equilibrium [TS].
Experiments on a real-world commercial system demonstrate
that the proposed CCF model not only outperforms CF
models in both offline and online tests but is also highly
effective at achieving strategic goals.

Outline: The rest of the paper is structured as follows.
We first briefly review the current matrix completion
formulation and collaborative filtering in Section 2. We then
present our new game-theoretic formulation in Section 3 and
the CCF model in Section 4. Experiments are presented in
Section 5, followed by a summary and conclusion in Section 6.

2. USER-ITEM INTERACTIONS AND 
COLLABORATIVE FILTERING 

Many existing approaches think of recommendation as
user-item interactions and therefore aim to recover (estimate)
the preference of each individual user for the items. Given a
set of $N$ users

$$u \in \mathcal{U} := \{1, 2, \ldots, N\}$$

and a set of $M$ items

$$i \in \mathcal{I} := \{1, 2, \ldots, M\},$$

this is naturally formulated as a matrix completion problem:
we are given observations of dyadic responses
$\{(u, i, y_{ui})\}$, with each $y_{ui}$ being an observed response
indicating the user's preference (e.g., the user's rating of an
item, or an indication of whether user $u$ likes item $i$), and the
goal is to complete the whole mapping

$$(u, i) \mapsto y_{ui}, \quad u \in \mathcal{U},\ i \in \mathcal{I},$$

which constitutes a large matrix $Y \in \mathcal{Y}^{|\mathcal{U}| \times |\mathcal{I}|}$. Assuming each
item can be consumed multiple times, recommendation is
usually done by a simple preference-based ranking according
to $Y$ (i.e., recommending to user $u$ the items with the highest
$y_{ui}$ scores). This formulation includes both of the two major
categories of approaches to recommendation, i.e., content-based
filtering [8] and collaborative filtering [27] [22] [15], of which
we briefly review the latter.

It is worth noting that the observed responses are often
extremely sparse in realistic systems: while we might
have millions of users and items, only a tiny proportion
(considerably less than 1%) of the entries of the matrix $Y$ are
observed. This "data sparseness" issue has been widely
recognized as one of the key challenges of recommender
systems [22]. To this end, collaborative filtering (CF) exploits
the notion of "collaboration effects", i.e., similar users
have similar preferences for similar items. By encoding
collaboration, CF pools the sparse observations in such a way
that, to predict $y_{ui}$, it also borrows observations from
other (similar) users and items. Generally speaking, existing CF
methods fall into one of the following two categories.

Neighborhood models. A popular class of approaches to
CF is based on propagating observed responses among items
or users that are considered neighbors. The model first
defines a similarity measure between items or users. Then,
an unseen response between user $u$ and item $i$ is
approximated from the responses of neighboring users
or items [27] [22], for example, by simply averaging the
neighboring responses with similarities as weights.
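A minimal sketch of the user-based neighborhood prediction just described, assuming cosine similarity over the observed entries and a top-k neighbor set. The function name `predict_neighborhood`, the choice of similarity and the parameter `k` are illustrative assumptions, not a prescribed method.

```python
import numpy as np

def predict_neighborhood(Y, mask, u, i, k=2):
    """Hypothetical user-based neighborhood CF: predict y_ui as the
    similarity-weighted average of the responses to item i given by the
    k users most similar to u (cosine similarity on observed entries)."""
    Yo = np.where(mask, Y, 0.0)                  # zero out unobserved entries
    norms = np.linalg.norm(Yo, axis=1) + 1e-12   # avoid division by zero
    sims = Yo @ Yo[u] / (norms * norms[u])       # cosine similarity to user u
    # Candidate neighbors: other users whose response to item i is observed
    cand = [v for v in range(Y.shape[0]) if v != u and mask[v, i]]
    if not cand:
        return float("nan")
    top = sorted(cand, key=lambda v: sims[v], reverse=True)[:k]
    w = np.array([sims[v] for v in top])
    r = np.array([Y[v, i] for v in top])
    if w.sum() <= 0:                             # fall back to a plain average
        return float(r.mean())
    return float(w @ r / w.sum())

Y = np.array([[5.0, 4.0, 0.0],
              [5.0, 4.0, 1.0],
              [1.0, 2.0, 5.0]])
mask = np.ones_like(Y, dtype=bool)
mask[0, 2] = False                               # y_{0,2} is the unseen response
print(predict_neighborhood(Y, mask, u=0, i=2))
```

User 0 is far more similar to user 1 than to user 2, so the prediction is pulled toward user 1's low response for item 2.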

Latent factor models. This class of methods learns predictive
latent factors to estimate the missing dyadic responses.
The basic idea is to associate latent factors^2 $\phi_u \in \mathbb{R}^k$ with
each user $u$ and $\psi_i \in \mathbb{R}^k$ with each item $i$, and to assume a
multiplicative model for the dyadic response,

$$p(y_{ui} \mid u, i) = p(y_{ui} \mid \phi_u^\top \psi_i; \Omega),$$

where $\Omega$ denotes the set of hyper-parameters. In this way,
the factors explain past responses and in turn make predictions
for future ones. This model implicitly encodes the
Aldous-Hoover theorem [3] for exchangeable matrices: the
$y_{ui}$ are independent of each other given $\phi_u$ and $\psi_i$. In
essence, it amounts to a low-rank approximation of the matrix
$Y$ that naturally embeds both users and items into a vector
space in which the inner products directly reflect semantic
relatedness.

To design a concrete model [2] [1] [15] [24] [28], one needs to
specify a distribution for this dependence. Afterwards, the
model boils down to an optimization problem. For example,
two commonly used formulations are:

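- ℓ2 regression. The most popular learning formulation is
to minimize the ℓ2 loss within an empirical risk minimization
framework [15]:

$$\min_{\phi, \psi} \sum_{(u,i) \in \mathcal{O}} \left(y_{ui} - \phi_u^\top \psi_i\right)^2 + \lambda_{\mathcal{U}} \sum_{u \in \mathcal{U}} \|\phi_u\|^2 + \lambda_{\mathcal{I}} \sum_{i \in \mathcal{I}} \|\psi_i\|^2,$$

where $\mathcal{O}$ denotes the set of $(u, i)$ dyads for which the
responses $y_{ui}$ are observed, and $\lambda_{\mathcal{U}}$ and $\lambda_{\mathcal{I}}$ are
regularization weights.

- Logistic. Another popular formulation [24] [1] is to use
logistic regression by optimizing the cross-entropy, with
binary responses $y_{ui} \in \{-1, +1\}$:

$$\min_{\phi, \psi} \sum_{(u,i) \in \mathcal{O}} \log\left[1 + \exp\left(-y_{ui}\, \phi_u^\top \psi_i\right)\right] + \lambda_{\mathcal{U}} \sum_{u \in \mathcal{U}} \|\phi_u\|^2 + \lambda_{\mathcal{I}} \sum_{i \in \mathcal{I}} \|\psi_i\|^2.$$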
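Both objectives above can be minimized with plain stochastic gradient descent over the observed dyads. The sketch below is a hypothetical illustration: the function `train_mf`, its learning rate, epoch count and initialization scale are assumptions. As is common in SGD implementations, the regularization gradient is applied at every visited dyad rather than once per user or item.

```python
import numpy as np

def train_mf(triples, N, M, k=4, loss="l2", lam_u=0.01, lam_i=0.01,
             lr=0.05, epochs=200, seed=0):
    """Hypothetical SGD trainer for the l2 and logistic objectives above.
    triples: list of (u, i, y_ui); for loss="logistic", y_ui must be +1 or -1."""
    rng = np.random.default_rng(seed)
    phi = 0.1 * rng.standard_normal((N, k))      # user factors phi_u
    psi = 0.1 * rng.standard_normal((M, k))      # item factors psi_i
    for _ in range(epochs):
        for idx in rng.permutation(len(triples)):
            u, i, y = triples[idx]
            s = phi[u] @ psi[i]                  # score phi_u^T psi_i
            if loss == "l2":
                g = -2.0 * (y - s)               # d/ds of (y - s)^2
            elif loss == "logistic":
                g = -y / (1.0 + np.exp(y * s))   # d/ds of log(1 + exp(-y s))
            else:
                raise ValueError(f"unknown loss: {loss}")
            # Both gradients use the pre-update factors
            g_u = g * psi[i] + 2.0 * lam_u * phi[u]
            g_i = g * phi[u] + 2.0 * lam_i * psi[i]
            phi[u] -= lr * g_u
            psi[i] -= lr * g_i
    return phi, psi

# Toy rank-1 data with one held-out dyad, (2, 2)
a, b = [1.0, 2.0, 3.0], [1.0, 0.5, 2.0]
triples = [(u, i, a[u] * b[i]) for u in range(3) for i in range(3) if (u, i) != (2, 2)]
phi, psi = train_mf(triples, 3, 3, k=2, lam_u=1e-4, lam_i=1e-4, lr=0.02, epochs=500)
print(phi[2] @ psi[2])   # prediction for the held-out dyad
```

Predictions for unobserved dyads are then $\phi_u^\top \psi_i$, and the preference-based ranking mentioned above simply sorts items by these scores.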

3. USER-SYSTEM INTERACTION AS 
COLLABORATIVE GAMES 

Based on the perspective of user-item interactions, the
matrix completion formulation for recommendation has led
to numerous algorithms that excel on a number of data sets,
including the prize-winning work of [15] and many other
successful collaborative filtering algorithms [27] [22] [26] [15]
[31] [17]. However, as we have discussed, this formulation is
inherently flawed; instead, it is advantageous to model
user-system interactions so as to capture the interdependent

^2 We assume each latent factor $\phi$ contains a constant component
so as to absorb user/item-specific offsets into the latent
factors $\phi$ and $\psi$.
Table 1: An example trace of user-system interactions in
recommendation.



decision-making process between the system and the users.
This motivates a novel game-theoretic formulation for
recommendation and opens up a promising direction that
enables us to optimize the recommendation policy strategically
with respect to important business objectives, which cannot be
achieved with the conventional matrix completion formulation.

Consider a typical scenario of user-system interaction in a
recommender system: we have users $u \in \mathcal{U} := \{1, 2, \ldots, N\}$
and $M$ items $i \in \mathcal{I} := \{1, 2, \ldots, M\}$; when a user $u$ visits the
site, the system recommends a set of items $A = \{i_1, \ldots, i_l\}$,
and $u$ in turn chooses a (possibly empty) subset $R \subseteq A$ for
consumption (e.g., buys some of the recommended products).
From now on, we refer to $A$ as the action and $R$ as the reaction.
For simplicity, we assume each action has a fixed, given length,
$|A| = l$, and that each reaction is either empty or contains
exactly one choice, $|R| \in \{0, 1\}$. Therefore, we have
$A \in \mathcal{A} = \mathcal{I}^l$ and $R \in \mathcal{R} \subset \mathcal{I} \cup \{\emptyset\}$. Table 1 shows
an example trace of such interactions.

The behavior of the recommender system and that of the
users are interdependent. On the one hand, since people
make different decisions when facing different contexts, a
user's decision $R$ depends crucially on the action of the
system, $A$ (i.e., what was presented to him). For instance, an
item $i$ would not have been chosen by $u$ if it had not been
presented to him in the first place; likewise, user $u$ could
choose another item if the context $A$ changed such that a
better item were recommended to him. On the other hand, how
a recommender system acts also depends on the user's behavior
(i.e., response), because the success of recommendation (e.g.,
in terms of click-through or revenue) is defined directly by
how users react to it (e.g., purchasing a product, clicking an
ad, renting a movie). It is therefore natural to formulate
recommendation based on game theory, by analogy with a
monopoly market in which the recommender is the sole seller,
a user is a buyer and the items are the goods.

Formally, the user-system interactions in a recommender
system can be formulated as a set of $N$ non-cooperative
games $\mathcal{G} = \{G_n = (P_n, \mathcal{Z}_n, U_n),\ n = 1, 2, \ldots, N\}$. For each
game $G_n$, the player set $P_n = \{S, u_n\}$ consists of two players,
i.e., the system (seller) $S$ and a user (buyer) $u_n$; the policy
space $\mathcal{Z}_n = \mathcal{A} \times \mathcal{R} \subset \mathcal{I}^l \times (\mathcal{I} \cup \{\emptyset\})$ is the set of all
possible action-reaction pairs $Z_n = (A_n, R_n)$, where $Z_n$ is
called an outcome and $\mathcal{Z}_n$ the outcome space; and the utility
(i.e., payoff) function $U_n = \{U_s(Z_n), U_u(Z_n)\}$ consists of
the system's payoff $U_s$ and the user's payoff $U_u$. At an
interaction $t$, a user $u_t$ visits the system and the game $G_{u_t}$
is played with outcome $Z_t = (A_t, R_t)$ and utility output
$U(A_t, R_t)$. Since the users' behavior is not under our control,
our goal in designing a recommender system is to generate
a system action (recommendations) $A_t$ for an incoming visit
$t$ of user $u_t$ so as to maximize the system's payoff $U_s(Z_t)$.

It is important to emphasize that the games in $\mathcal{G}$ should
not be modeled as independent games. In particular, since
the outcome space can be very large, yet observations are
typically sparse, it is practically important to leverage the
collaboration effect, such that similar games are expected to
yield similar outcomes. This enables us to pool the sparse
evidence across different but similar games and in turn obtain
reliable statistical inference. For this reason, with a slight
abuse of terminology, we term the formulation "collaborative
games".

This game-theoretic formulation provides a novel perspective
on recommendation. In particular, since the strategies
of the buyer and the seller are interdependent, to optimize
the seller's action we have to (1) predict in advance the
buyer's reaction $R$ for each candidate action $A$; and then (2)
find the best action $A$ by maximizing the achievable payoff
$U_s(A, R)$.

4. COLLABORATIVE COMPETITIVE 
FILTERING 

Our recent work [32] established the first principled model
for learning preferences from user-system interactions in
recommender systems. Unlike conventional preference learning
models, which are trained on the who-likes-what matrix,
our CCF preference model is trained on user-system interactions,
where the system action $A$ is used as the context in which
a user's reaction $R$ (e.g., a "like") is made. In other words, the
CCF model captures not only who likes what, but also which
options were available to the user when the "like" decision
was made. As demonstrated in our experiments [32] and in
many other successful applications (e.g., online tests at
Yahoo! and Netflix), the CCF preference model significantly
improves recommendation performance on a variety
of data sets. However, like many existing recommendation
algorithms, our prior CCF model still falls within the
conventional matrix completion framework. To be precise, all
these models care only about, and are only capable of
modeling, the behavior of the user (i.e., what a user likes).
These techniques are lacking because they largely ignore the
interdependent, game-theoretic nature of user-system
interactions in recommendation; consequently, none of them is
able to optimize the recommendation policy explicitly with
respect to a prescribed objective (although many strategic
objectives for a recommender system are clearly defined).

In this paper, we extend our prior work and present a
game-theoretic framework for recommendation. We keep the
name "Collaborative-Competitive Filtering" (CCF), since the
preference model established in our prior work is revised and
used as an essential component of this framework. The CCF
framework consists of two components: (1) a model $p_u(R \mid A)$
that predicts in advance (with uncertainty) a buyer's reaction
$R$ to a given action $A$; and (2) a formulation for finding the
best action strategy (i.e., recommendation policy) for the seller.

4.1 Conditional User Reaction Modeling 

The first part of the framework is to predict a buyer's
choice $R$ in the context of any given action $A$ of the seller.
In a decision environment with imperfect information, this
means quantifying the conditional distribution $p_u(R \mid A)$. The
fully parametrized version of this distribution requires
$O(NM^{l+1})$ free parameters, whose statistical estimation is
practically prohibitive since observations are typically
available only at a scale far below $O(NM)$ (e.g., in matrix
completion, usually less than 1% of the entries are observed).
In this section, we revisit and extend our CCF preference
model [32] by presenting a conditional reaction model with
complexity $O(N + M)$.
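To make these orders of magnitude concrete, the sketch below evaluates them for hypothetical sizes ($N = 10^6$ users, $M = 10^4$ items, $l = 5$ slots, $k = 50$ factors; all illustrative assumptions). It counts the full table as $NM^{l+1}$, the IIA-restricted model of Section 4.1.1 as $NM^2$, and the multinomial logit factor model of Section 4.1.3 as $k(N + M) + N$, matching the orders stated in the text rather than exact degrees of freedom.

```python
def parameter_counts(N, M, l, k):
    """Orders of magnitude of free parameters for p_u(R|A) under three models."""
    full = N * M ** (l + 1)   # unrestricted table: O(N M^{l+1})
    iia = N * M ** 2          # pairwise odds under Axiom 2: O(N M^2)
    mlf = k * (N + M) + N     # k-dim factors per user/item plus theta_u: O(N + M)
    return {"full": full, "iia": iia, "mlf": mlf}

counts = parameter_counts(N=10**6, M=10**4, l=5, k=50)
for name, n in counts.items():
    print(f"{name:>4}: {n:.2e}")
```

Only the last count grows linearly in $N$ and $M$.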

4.1.1 Behavioral Axioms of Choice Process 

We first present an axiomatic view of the choice process.
We assume a good (i.e., item) $i$ has a potential utility $r_{ui}$ to
a buyer $u$. Moreover, we assume a buyer $u$ is a rational
decision maker: he knows that his choice of a good $i$ will come
at the expense of the other available alternatives $i' \in A$, and
therefore he compares all the alternatives before making his
choice. In other words, for each decision, $u$ considers both
revenue and opportunity cost, and decides which good to buy
based on the potential profit of each good in $A$. Specifically,
the opportunity cost $c_{ui}$ is the potential loss $u$ incurs by
buying a good $i$, which precludes him from buying the other
alternatives: $c_{ui} = \max\{r_{ui'} : i' \in A \setminus \{i\}\}$; the profit
$\pi_{ui} = r_{ui} - c_{ui}$ is the net gain of the decision. Based on
rational decision theory [18], we have the following axiom
about the buyer's choice reaction.

Axiom 1 [Local optimality of choice]: A rational decision
is a decision maximizing the profit: $i^* = \arg\max_{i \in A} \pi_{ui}$.
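A small worked example of Axiom 1 with hypothetical utilities for three offered items: the sketch computes each opportunity cost $c_{ui}$ and profit $\pi_{ui}$ and picks the profit-maximizing item. The function name `choose_by_profit` is illustrative only.

```python
def choose_by_profit(r):
    """Axiom 1 on a dict item -> r_ui over the offered set A (needs |A| >= 2).
    Returns the profits pi_ui = r_ui - c_ui and the profit-maximizing item."""
    profit = {}
    for i, r_i in r.items():
        c_i = max(r_j for j, r_j in r.items() if j != i)   # opportunity cost c_ui
        profit[i] = r_i - c_i                               # net gain pi_ui
    return profit, max(profit, key=profit.get)

profit, choice = choose_by_profit({"a": 3.0, "b": 1.0, "c": 2.5})
print(profit)   # {'a': 0.5, 'b': -2.0, 'c': -0.5}
print(choice)   # a
```

Only the top-utility item has a positive profit, so Axiom 1 selects the utility maximizer; any increasing transformation of the $r_{ui}$ leaves that choice unchanged, which is the non-uniqueness issue raised below.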

This axiom implies a local competitive effect: the buyer
$u$ tends to choose the good that is locally the best in the
context of the available alternatives in $A_t$. Unfortunately,
the axiom determines the utility function only up to an
arbitrary order-preserving transformation (e.g., a monotonically
increasing function), and hence cannot yield a unique solution
[19]. Another issue is that it is deterministic, and thus less
useful, since we do not have perfect information about how
users react. To this end, we draw a stochastic counterpart of
this axiom from random utility theory [18] [21]:

Axiom 2 [Independence of Irrelevant Alternatives]:
For any given context set $A$, the relative odds of a user $u$'s
selecting an item $i \in A$ over another item $j \in A$ should
be independent of the presence or absence of any irrelevant
items, i.e.,

$$\frac{p_u(i \mid \{i, j\})}{p_u(j \mid \{i, j\})} = \frac{p_u(i \mid A)}{p_u(j \mid A)}.$$

Note that this axiom brings the parametric complexity of
$p_u(R \mid A)$ significantly down from $O(NM^{l+1})$ to $O(NM^2)$.

4.1.2 User Utility Parametrization 

In the spirit of random utility theory [18] [21], we
decompose the buyer's utility function into two parts, i.e.,
$U_u(i) = r_{ui} + \epsilon_{ui}$, where: (1) $r_{ui}$ is a deterministic
component characterizing the intrinsic interest of buyer $u$ in
good $i$; and (2) $\epsilon_{ui}$ is a stochastic, unobserved error term
reflecting the uncertainty, richness and complexity of the
choice process. Under very mild conditions, it has been shown
that the error terms $\epsilon_{ui}$ are independently and identically
distributed with the Weibull (extreme value) distribution [11]:

$$P(\epsilon_{ui} \le \epsilon) = e^{-e^{-\epsilon}}. \qquad (2)$$

Furthermore, to encode the collaborative effect such that the
observed evidence can be pooled across similar games, we
parametrize the deterministic utilities $r_{ui}$ with the
multiplicative latent factor model [15]:

$$r_{ui} = \phi_u^\top \psi_i, \qquad (3)$$

where $\phi_u \in \mathbb{R}^k$ and $\psi_i \in \mathbb{R}^k$ are low-rank latent profiles for
user $u$ and item $i$ respectively, just as in the collaborative
filtering models described in Section 2.

4.1.3 The Multinomial Logit Factor Model 

The behavioral axiom and the low-rank parametrization
together lead to the following theorem.

Theorem 1: Suppose the utility function is $U_u(i) = r_{ui} + \epsilon_{ui}$,
where the $\epsilon_{ui}$ are i.i.d. Weibull variables. Then the
distribution of selecting one item that satisfies Axiom 2 is
given by

$$p_u(i \mid A) = \frac{e^{r_{ui}}}{\sum_{j \in A} e^{r_{uj}}}, \quad i \in A.$$

Proof: c.f. [11]. □
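Theorem 1 can be checked numerically. The sketch below (function names are illustrative) compares the closed-form multinomial logit probabilities with the empirical choice frequencies obtained by adding i.i.d. standard extreme value (Gumbel) noise, whose CDF is Eq. (2), to hypothetical deterministic utilities and taking the utility-maximizing item. It also checks Axiom 2: the odds of item 0 over item 1 are the same whether or not item 2 is offered.

```python
import numpy as np

def mnl_probs(r):
    """Closed-form choice probabilities of Theorem 1: exp(r_i) / sum_j exp(r_j)."""
    z = np.exp(r - np.max(r))          # shift by the max for numerical stability
    return z / z.sum()

def simulate_choices(r, n=200_000, seed=0):
    """Empirical frequencies of argmax_i (r_i + eps_i) with i.i.d. standard Gumbel eps."""
    rng = np.random.default_rng(seed)
    eps = rng.gumbel(loc=0.0, scale=1.0, size=(n, len(r)))
    choices = np.argmax(r + eps, axis=1)
    return np.bincount(choices, minlength=len(r)) / n

r = np.array([1.0, 0.0, -0.5])         # hypothetical utilities r_ui for A = {0, 1, 2}
print(mnl_probs(r))
print(simulate_choices(r))             # close to the line above

# Axiom 2 (IIA): the odds of item 0 over item 1 do not depend on item 2
p_full, p_pair = mnl_probs(r), mnl_probs(r[:2])
print(p_full[0] / p_full[1], p_pair[0] / p_pair[1])   # both approximately 2.718
```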



The above model is well known as the multinomial logit
model, which has been used extensively for modeling
conventional offline consumer choice behavior (e.g., choice of
occupation, brand or housing) in econometrics [21] [19],
sociometrics [18] and marketing science [10] [12]. We adapt it
to modeling online game-theoretic interactions in recommender
systems. In contrast to traditional choice models, where
the deterministic part of the utility $r_{ui}$ is a linear mapping
$w^\top x_{ui}$ of observed features $x_{ui}$ (i.e., measured user and
item features), here we employ the multiplicative latent factor
parametrization. The formulation proposed here seamlessly
integrates two distinct methodologies: choice models from
econometrics and factorization models from collaborative
filtering. This integration is significant because it enables us
to model the seller-buyer games collaboratively, rather than
independently as in conventional choice models. That is, it
enables us to pool data across games such that interactions
involving similar users, similar actions and similar reactions
are dealt with in a similar way.

Moreover, in conventional choice models, it is assumed
that in each interaction $t$ the buyer will take at least one
item $i^* \in A_t$. This assumption, however, does not hold in
our case, since a user's visit to a recommender system does
not always yield a response. For example, users frequently
visit e-commerce websites without making any purchase, or
browse a news portal without clicking on any ad. In fact,
such visits without a response may account for the vast
majority of the traffic that a recommender system receives.
More interestingly, different users may have different
propensities for giving a response, and it is important to
reflect this in the model as well. To this end, we add a scalar
latent factor $\theta_u$ for each user $u$ to capture the response
propensity of buyer $u$. At an interaction $t$, we assume buyer
$u_t$ makes an effective purchase only if he feels that the
overall quality of the offered goods $A_t$ is good enough. In
other words, there is a certain reserve utility that needs to be
exceeded for a user to respond. In keeping with the
multinomial logit model and the latent factor parametrization,
we have the following model:



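$$p_u(R = i \mid A) = \frac{\exp(\phi_u^\top \psi_i)}{\exp(\theta_u) + \sum_{j \in A} \exp(\phi_u^\top \psi_j)}, \quad \forall i \in A; \qquad (4)$$

$$p_u(R = \emptyset \mid A) = \frac{\exp(\theta_u)}{\exp(\theta_u) + \sum_{j \in A} \exp(\phi_u^\top \psi_j)}, \qquad (5)$$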



which we refer to as the multinomial logit factor (MLF) model.
Note that this new formulation significantly reduces the
parametric complexity of $p_u(R \mid A)$ to a linear scale, i.e.,
$O(k(N + M) + N) \approx O(N + M)$, where $k$ is the dimensionality
of the latent factors $\phi \in \mathbb{R}^k$ and $\psi \in \mathbb{R}^k$, which is generally a
small number (usually up to a few hundred).
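A minimal sketch of Eqs. (4)-(5) for one user and one action, assuming NumPy arrays for the factors; the function `mlf_probs` and the toy values are hypothetical. The no-response option enters the normalizer exactly like an extra alternative with utility $\theta_u$.

```python
import numpy as np

def mlf_probs(phi_u, theta_u, psi_A):
    """Eqs. (4)-(5). phi_u: (k,) user factor; theta_u: scalar propensity;
    psi_A: (l, k) factors of the offered items.
    Returns (p_u(R = i | A) for each offered item, p_u(R = empty | A))."""
    logits = np.concatenate(([theta_u], psi_A @ phi_u))  # slot 0 is "no response"
    w = np.exp(logits - logits.max())   # the shift cancels in the normalization
    p = w / w.sum()
    return p[1:], p[0]

phi_u = np.array([1.0, 0.0])
psi_A = np.array([[1.0, 0.0], [0.0, 1.0]])   # two offered items
print(mlf_probs(phi_u, 0.0, psi_A))          # items: e/(e+2), 1/(e+2); none: 1/(e+2)
print(mlf_probs(phi_u, 2.0, psi_A)[1])       # larger theta_u: more mass on no response
```

Raising $\theta_u$ shifts probability mass toward no response, which is how the model expresses a user's low response propensity.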

4.1.4 Position Bias 

An important factor that is overlooked by the MLF model, yet
matters in practice, is position bias. In particular, the choice
of a buyer depends not only on the utilities of the available
alternatives but also on how they are placed (i.e., their
positions); e.g., users usually pay attention only to a few
top-ranked goods and disregard the others entirely. Such
position bias is evident in many online decision-making
scenarios, e.g., Web search, recommendation and advertising.
We extend the MLF model by adding a set of position-specific
latent factors $\{\beta_p \in \mathbb{R}^k, p = 1, \ldots, l\}$ via:



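$$p_u(R = i \mid A) = \frac{\exp(\langle \phi_u, \psi_i, \beta_{p(i)} \rangle)}{\exp(\theta_u) + \sum_{j \in A} \exp(\langle \phi_u, \psi_j, \beta_{p(j)} \rangle)}, \qquad (6)$$

where $p(i)$ denotes the position of item $i$,
$\langle \phi, \psi, \beta \rangle = \mathbf{1}^\top (\phi \circ \psi \circ \beta) = \sum_{d=1}^{k} \phi_d \psi_d \beta_d$ is a three-way inner product,
and $\circ$ denotes the Hadamard (aka element-wise) product.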
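The three-way inner product and Eq. (6) can be sketched as follows (hypothetical function names and toy values). Two copies of the same item shown at different positions receive different probabilities purely through the position factors $\beta_p$.

```python
import numpy as np

def three_way(phi, psi, beta):
    """<phi, psi, beta> = 1^T (phi o psi o beta) = sum_d phi_d psi_d beta_d."""
    return float(np.sum(phi * psi * beta))

def position_probs(phi_u, theta_u, psi_A, beta):
    """Eq. (6). Row p of psi_A is the item shown at position p; beta[p] is its position factor."""
    scores = np.array([three_way(phi_u, psi_A[p], beta[p]) for p in range(len(psi_A))])
    logits = np.concatenate(([theta_u], scores))
    w = np.exp(logits - logits.max())
    p = w / w.sum()
    return p[1:], p[0]                 # (item probabilities, no-response probability)

# The same item shown twice: the top position wins only through beta
phi_u = np.array([1.0, 1.0])
psi_A = np.array([[1.0, 0.0], [1.0, 0.0]])
beta = np.array([[1.0, 1.0], [0.5, 0.5]])
print(position_probs(phi_u, 0.0, psi_A, beta))
```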

4.1.5 Conditional Maximum Likelihood Estimation 

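Given a collection of training interactions $\{(u_t, A_t, R_t)\}$,
the latent factors $\phi$, $\psi$ and $\beta$ and the propensities $\theta$ can
be estimated using penalized conditional maximum likelihood
estimation via

$$\min_{\phi, \psi, \beta, \theta} \sum_t \Big( \log\Big[e^{\theta_{u_t}} + \sum_{j \in A_t} e^{\langle \phi_{u_t}, \psi_j, \beta_{p(j)} \rangle}\Big] - (1 - \delta_{\emptyset,t}) \langle \phi_{u_t}, \psi_{R_t}, \beta_{p(R_t)} \rangle - \delta_{\emptyset,t}\, \theta_{u_t} \Big) + \lambda_{\mathcal{U}} \sum_{u \in \mathcal{U}} \|\phi_u\|^2 + \lambda_{\mathcal{I}} \sum_{i \in \mathcal{I}} \|\psi_i\|^2 + \lambda_P \sum_{p=1}^{l} \|\beta_p\|^2,$$

where $\delta_{\emptyset,t} = 1$ if $R_t = \emptyset$, and $0$ otherwise.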
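A direct transcription of the penalized objective above, assuming the items of each action are listed in display order so that position $p$ is the list index; the data layout and the function name `penalized_nll` are assumptions. A log-sum-exp shift keeps the normalizer numerically stable.

```python
import numpy as np

def penalized_nll(data, phi, psi, beta, theta, lam_u, lam_i, lam_p):
    """Penalized negative conditional log-likelihood (the objective above).
    data: iterable of (u, A, R); A lists item ids in display order (position = index),
    R is an item id in A, or None for no response."""
    nll = 0.0
    for u, A, R in data:
        scores = np.array([np.sum(phi[u] * psi[i] * beta[p]) for p, i in enumerate(A)])
        logits = np.concatenate(([theta[u]], scores))
        m = logits.max()
        log_z = m + np.log(np.exp(logits - m).sum())      # stable log-sum-exp
        chosen = theta[u] if R is None else scores[A.index(R)]
        nll += log_z - chosen
    nll += lam_u * np.sum(phi ** 2) + lam_i * np.sum(psi ** 2) + lam_p * np.sum(beta ** 2)
    return float(nll)

# With all-zero parameters every one of the l + 1 outcomes has probability 1/(l + 1)
data = [(0, [0, 1, 2], 1), (1, [1, 2, 3], None)]
zeros = np.zeros
print(penalized_nll(data, zeros((2, 3)), zeros((4, 3)), zeros((3, 3)), zeros(2), 0.1, 0.1, 0.1))
```

With all parameters at zero, each interaction contributes $\log(l + 1)$, so the printed value is $2 \log 4$.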

4.1.6 Distributed Stochastic Optimization 

Due to the use of bilinear multiplications, although the
negative conditional log-likelihood is convex w.r.t. $r_{ui}$ (each
objective term is a log-sum-exp minus a linear function), it is
nonetheless non-convex w.r.t. the latent factors $\phi$ and $\psi$.
Moreover, since the interactions evolve over time, it is
desirable to have algorithms that are sufficiently efficient and
preferably capable of updating dynamically to reflect incoming
data streams; this excludes offline learning algorithms such as
classical SVD-based factorization algorithms [TS] or spectral
eigenvalue decomposition methods [17]. Here, we use a
distributed stochastic gradient variant based on the Hadoop
MapReduce framework. The infrastructure is analogous to
the one proposed in [33]. The basic module is a stochastic
gradient descent algorithm, which loops over all the
observations and updates the parameters by moving in the
direction defined by the negative gradient. For example, for a
given responded session $(u, A, i^*)$, we can carry out the
following updates of the latent factors on each machine
separately:



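- For $u$:
$$\phi_u \leftarrow \phi_u - \eta \Big[\sum_{j \in A} \ell'(u, j)\, \psi_j \circ \beta_{p(j)} + \lambda_{\mathcal{U}} \phi_u\Big].$$

- For each $i \in A$:
$$\psi_i \leftarrow \psi_i - \eta \big[\ell'(u, i)\, \phi_u \circ \beta_{p(i)} + \lambda_{\mathcal{I}} \psi_i\big].$$

- For each $p \in \{1, \ldots, l\}$:
$$\beta_p \leftarrow \beta_p - \eta \big[\ell'(u, i_p)\, \psi_{i_p} \circ \phi_u + \lambda_P \beta_p\big],$$

where $i_p$ denotes the item shown at position $p$ and $\eta$ is the
learning rate. The gradient is given by:

$$\ell'(u, i) = \frac{\exp(\langle \phi_u, \psi_i, \beta_{p(i)} \rangle)}{\exp(\theta_u) + \sum_{j \in A} \exp(\langle \phi_u, \psi_j, \beta_{p(j)} \rangle)} - \delta_{i, i^*}. \qquad (7)$$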
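One stochastic update for a responded session, following the three update rules and Eq. (7), might look like the sketch below. Assumptions: items in `A` are distinct and listed in display order, all gradients are computed from the pre-update parameters, and the propensity $\theta_u$ is also updated with its gradient $p_u(R = \emptyset \mid A)$ for a responded session, a step the text does not spell out. As in the text's update rules, the factor 2 from the squared-norm penalties is absorbed into the $\lambda$s. The function name `sgd_step` is hypothetical.

```python
import numpy as np

def sgd_step(u, A, i_star, phi, psi, beta, theta, eta=0.1,
             lam_u=0.01, lam_i=0.01, lam_p=0.01):
    """In-place SGD update for a responded session (u, A, i_star).
    A lists distinct item ids in display order (position p = index in A).
    Returns p_u(R = i_star | A) evaluated before the update."""
    l = len(A)
    scores = np.array([np.sum(phi[u] * psi[A[p]] * beta[p]) for p in range(l)])
    logits = np.concatenate(([theta[u]], scores))
    w = np.exp(logits - logits.max())
    prob = w / w.sum()                                   # prob[0] = p_u(R = empty | A)
    # Eq. (7): l'(u, i) = p_u(R = i | A) - delta_{i, i*} for each shown item
    g = prob[1:] - (np.array(A) == i_star).astype(float)
    g_phi = sum(g[p] * psi[A[p]] * beta[p] for p in range(l)) + lam_u * phi[u]
    g_psi = [g[p] * phi[u] * beta[p] + lam_i * psi[A[p]] for p in range(l)]
    g_beta = [g[p] * psi[A[p]] * phi[u] + lam_p * beta[p] for p in range(l)]
    g_theta = prob[0]     # assumed extra step: d(-log p)/d theta_u when R != empty
    phi[u] -= eta * g_phi
    for p in range(l):
        psi[A[p]] -= eta * g_psi[p]
        beta[p] -= eta * g_beta[p]
    theta[u] -= eta * g_theta
    return float(prob[1 + A.index(i_star)])
```

Repeating the step on the same session should steadily raise the probability of the chosen item, which is a quick sanity check on the gradient signs.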



4.2 Strategic System Action Optimization 

The distribution $p_u(R \mid A)$ characterizes (probabilistically)
how a buyer would react to a given action. Knowing this
enables us to optimize the seller's action strategy (i.e.,
recommendation policy) by maximizing its utility (payoff) $U_s$
[16]. In this section, we show that this can be formulated
based on von Neumann-Morgenstern expected utility theory.
We then specify the formulation in terms of three example
payoff objectives, i.e., click-through rate, sales revenue
and consumption diversity.

4.2.1 Expected Utility Maximization 

Because of the uncertainty (risk) inherent in the game, it is
natural to formulate action optimization as decision making
under uncertainty. Consider a given game $G_u$ between the
seller $S$ and a specific buyer $u$. The action space is the set
of all possible combinations of $l$ goods, $\mathcal{A} = \mathcal{I}^l$. An action
$A \in \mathcal{A}$ yields an outcome $Z = (A, R) \in \mathcal{Z} = \mathcal{A} \times \mathcal{R}$ with
probability distribution $p(Z)$ (aka a lottery), where the
reaction space is $\mathcal{R} = A \cup \{\emptyset\}$. Because our knowledge
about the environment is imperfect, we would rather adopt a
probabilistic action strategy, such that actions for $G_u$ are
sampled according to a distribution $p_u(A)$ defined over the
action space $\mathcal{A}$ (i.e., any $A \in \mathcal{A}$ is taken with probability
$p_u(A)$). Then we have $p_u(Z) = p_u(A)\, p_u(R \mid A)$, where the
subscript makes the dependence on the user explicit, to
emphasize the fact that the action is customized for each user.

A utility function $U_s(Z)$ is a mapping $U_s: \mathcal{Z} \to \mathbb{R}$, which defines a preference relation $\succeq$ over the outcome space $\mathcal{Z}$ such that $Z \succeq Z'$ if and only if $U_s(Z) \ge U_s(Z')$. Without loss of generality, we assume $\succeq$ is von Neumann-Morgenstern rational, i.e., it satisfies the four axioms: completeness, transitivity, independence and continuity. The von Neumann-Morgenstern (vNM) theorem defines the best outcome of a decision in an environment under uncertainty as follows [?].

Theorem 2 (Expected Utility): Suppose $\succeq$ is a preference defined by a utility function $U_s$ that satisfies the four axioms. For any two distributions (lotteries) $p(Z)$ and $q(Z)$, we have $p \succeq q$ if and only if $\mathbb{E}_p(U_s) \ge \mathbb{E}_q(U_s)$.
Proof: cf. [?]. □

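Based on the vNM theorem, the optimal action strategy $p_u(A)$, given $p_u(R|A)$, can be obtained from the following linear optimization:

$$\max_{p_u(A)} \sum_{A\in\mathcal{A}} p_u(A) \sum_{R\in\mathcal{R}} p_u(R|A)\, U_s(A,R) \quad \text{s.t.} \quad \sum_{A\in\mathcal{A}} p_u(A) = 1,\; p_u(A) \ge 0. \tag{8}$$

Since the objective is linear in $p_u(A)$, its maximum is attained at a vertex of the probability simplex, so a simplex solution is given by:

$$p_u(A) = \delta_{A,A_u^*}, \quad \text{where } A_u^* = \arg\max_{A\in\mathcal{A}} \sum_{R\in\mathcal{R}} p_u(R|A)\, U_s(A,R).$$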
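For a toy catalogue, the simplex solution $A_u^*$ can be found by brute force, which makes the role of the payoff $U_s$ concrete. This is a sketch under stated assumptions: position bias is ignored, $p_u(R|A)$ is taken to be a multinomial logit over scalar item utilities plus a null reaction with utility $\theta_u$, and all names are hypothetical. Exhaustive enumeration is only feasible for tiny catalogues.

```python
import itertools
import math

def reaction_dist(action, util, theta_null):
    """Assumed p_u(R | A): multinomial logit over the items in A plus a null
    reaction (key None) whose utility is theta_null."""
    weights = {i: math.exp(util[i]) for i in action}
    z = math.exp(theta_null) + sum(weights.values())
    dist = {i: w / z for i, w in weights.items()}
    dist[None] = math.exp(theta_null) / z
    return dist

def expected_payoff(action, util, theta_null, payoff):
    """Inner sum of Eq. (8): sum_R p_u(R | A) * U_s(A, R)."""
    dist = reaction_dist(action, util, theta_null)
    return sum(p * payoff(action, r) for r, p in dist.items())

def simplex_solution(items, l, util, theta_null, payoff):
    """A_u^* = argmax over all l-item actions (exhaustive search)."""
    return max(itertools.combinations(items, l),
               key=lambda a: expected_payoff(a, util, theta_null, payoff))
```

With made-up utilities {0: 0.1, 1: 2.0, 2: -1.0, 3: 1.5, 4: 0.0}, $\theta_u = 0$ and $l = 2$, a click payoff selects the two highest-utility items (1, 3), whereas a revenue payoff in which item 0 costs 10 and the rest cost 1 selects (0, 2): pairing the expensive item with a weak competitor avoids diverting clicks away from it.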

In practice, it is usually preferable (e.g., for risk-robustness reasons) to choose a less sparse distribution (i.e., a portfolio [20]) rather than the singular distribution defined by the simplex solution; that discussion is, however, beyond the scope of this work.

4.2.2 Action Strategy Parametrization 

Although the simplex solution looks simple, exhaustive search throughout the outcome space is still practically prohibitive, as there are $O(N M^{l})$ extreme points. To this end, we propose to parameterize the action distribution in terms of a small set of parameters $\Theta$, i.e., to assume that action $A$ is sampled from a parametric distribution $p_u(A; \Theta)$. In this way, we can search $\mathcal{A}$ efficiently by optimizing $\Theta$ instead. As a preliminary study, here we devise a simple parametrization by randomizing a utility-based ranking scheme with a scalar parameter $\alpha$. In particular, for any given user $u$, let the top-ranked $l$ items (i.e., the items with the highest payoffs) be denoted $\{i_1^*, \ldots, i_l^*\}$; we generate the action $A$ as follows:

• $A = \emptyset$.

• For $j$ from 1 to $l$ do:

- With probability $(1 - \alpha)$ add $i_j^*$ to $A$

- With probability $\alpha$ add a random item to $A$

This way, action optimization in Eq. (8) becomes a one-dimensional optimization, whose solution can be obtained efficiently, e.g., via golden-section search.
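The randomized ranking and the one-dimensional search over $\alpha$ can be sketched as follows. This is a sketch with assumptions: the random item is drawn uniformly from a candidate pool excluding items already in $A$ (and is also used as a fallback if $i_j^*$ was already drawn at random), and `golden_section_max` assumes the objective $J(\alpha)$ is unimodal on $[0, 1]$. All names are hypothetical.

```python
import math
import random

def sample_action(ranked, pool, alpha, rng=None):
    """Randomized top-l action: slot j keeps ranked[j] with probability 1 - alpha,
    otherwise it receives a uniformly random pool item not already in A."""
    if rng is None:
        rng = random.Random()
    action = []
    for top_item in ranked:
        unused = [i for i in pool if i not in action]
        if rng.random() >= alpha and top_item not in action:
            action.append(top_item)
        else:
            action.append(rng.choice(unused))  # random draw (or fallback)
    return action

def golden_section_max(f, lo=0.0, hi=1.0, tol=1e-6):
    """Maximize a unimodal scalar function f on [lo, hi] by golden-section search."""
    invphi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2
```

In use, $J(\alpha)$ could be, for example, a Monte Carlo estimate of the Eq. (8) objective averaged over actions drawn with `sample_action` using a fixed seed, so that $J$ is deterministic across evaluations; that choice is an assumption of this sketch.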

A more flexible parametrization is to factorize $p(A)$ sequentially, $p(A) = p(i_1)\,p(i_2|i_1) \cdots p(i_l|i_1, i_2, \ldots, i_{l-1})$; with a few simplifications, we can then search the action space by dynamic programming. We leave this for future research.

4.2.3 Strategic Payoff Specification 

So far, our discussion of action optimization has been in terms of an abstract payoff function $U_s$. We now specify our formulation with three concrete strategic objectives.

Payoff #1: Click-Through Rate (CTR). Click-through rate, or CTR, is the ratio of responses (i.e., $R_t \ne \emptyset$) out of all interactions. CTR is the most important measure of success for many real-world recommender systems because it crucially determines many important factors, ranging from traffic and revenue to user base. For example, it corresponds to the advertisement click rate in Google, the movie rental rate in Netflix, the order placement rate in Amazon, and the rate of friend connection in Facebook Friend-Finder. CTR can be formulated in the CCF framework as follows:

$$\mathrm{CTR} = \mathbb{E}_u\big[\mathbb{E}_A[f_u\, p_u(R \ne \emptyset | A)]\big] \tag{9}$$

where $f_u$ is a measure of user loyalty (e.g., user $u$'s visit frequency) and

$$p_u(R \ne \emptyset | A) = 1 - \frac{\exp(\theta_u)}{\exp(\theta_u) + \sum_{i\in A}\exp(\theta_{ui})}.$$

We carry out an annealing procedure to discount $\eta$ by a constant factor after each iteration, as suggested by [14].
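Eq. (9) can be estimated directly from the response probability. This is a minimal sketch assuming that $\mathbb{E}_u$ is a plain average over users and that $\mathbb{E}_A$ is approximated by averaging over actions sampled from $p_u(A)$; each action is represented only by the list of its item utilities $\theta_{ui}$, and the function names are hypothetical.

```python
import math

def response_prob(theta_u, item_utils):
    """p_u(R != null | A) = 1 - exp(theta_u) / (exp(theta_u) + sum_i exp(theta_ui))."""
    p_null = 1.0 / (1.0 + sum(math.exp(t - theta_u) for t in item_utils))
    return 1.0 - p_null

def expected_ctr(users):
    """Monte Carlo form of Eq. (9).

    users: list of (f_u, theta_u, actions), where actions is a list of
    item-utility lists for actions sampled from p_u(A)."""
    total = 0.0
    for f_u, theta_u, actions in users:
        total += f_u * sum(response_prob(theta_u, a) for a in actions) / len(actions)
    return total / len(users)
```

For instance, with $\theta_u = 0$ and three items of utility 0, the null reaction has probability 1/4, so the response probability is 0.75.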

Payoff #2: Sales Revenue (SR). Another important measure of success is sales revenue, or SR, which is the revenue that a recommender system receives from its transactions (interactions) with the users. SR is a weighted version of CTR, i.e., each click is assigned an importance weight. Based on CCF, SR can be formulated via:

$$\mathrm{SR} = \mathbb{E}_u\Big[\mathbb{E}_A\Big[\sum_{i\in A} c_i\, p_u(R = i | A)\Big]\Big] \tag{10}$$

where $c_i$ denotes the price (weight) of item $i$.
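The inner term of Eq. (10) for a single action is a price-weighted sum of choice probabilities. The sketch below assumes the same multinomial logit with a null option as in the CTR formula, with item utilities $\theta_{ui}$ passed as a list aligned with the prices; the names are hypothetical.

```python
import math

def choice_probs(theta_u, item_utils):
    """p_u(R = i | A) for each item in A (null option with utility theta_u)."""
    m = max([theta_u] + list(item_utils))  # shift for numerical stability
    z = math.exp(theta_u - m) + sum(math.exp(t - m) for t in item_utils)
    return [math.exp(t - m) / z for t in item_utils]

def expected_revenue(theta_u, item_utils, prices):
    """Inner term of Eq. (10): sum_{i in A} c_i * p_u(R = i | A)."""
    return sum(c * p for c, p in zip(prices, choice_probs(theta_u, item_utils)))
```

With $\theta_u = 0$, three items of utility 0 and prices (1, 2, 3), each item is chosen with probability 0.25, giving an expected revenue of 1.5.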

Payoff #3: Consumption Diversity (CD). It is widely believed that recommender systems are the key contributor turning the industry from what used to be a highly concentrated "blockbuster"* market towards a highly diversified long-tail (niche) market [5][30]. Recent research shows, however, that this is not entirely true: a recommender system, if designed improperly, could reinforce consumption concentration [9]. In order not to turn our society into an echo chamber, it is important to encourage consumption diversity (CD), i.e., to ensure that the consumption of the whole population is not narrowly concentrated. Moreover, CD is also important to online firms, as it helps them profit from the long-tail market. CD can be formulated in the CCF framework in terms of expected choice entropy:

$$\mathrm{CD} = \mathbb{E}_u\big[\mathbb{E}_A[H_u(R|A)]\big] = -\sum_{u\in\mathcal{U}} f_u \sum_{A\in\mathcal{A}} p_u(A) \sum_{i\in A} p_u(R=i|A)\log p_u(R=i|A) \tag{11}$$

where $H_u(R|A) = -\sum_{i\in A} p_u(R=i|A)\log p_u(R=i|A)$ is the entropy of user $u$'s choice in the context of $A$. Note that consumption diversity is an aggregate measure (i.e., the diversity of the consumption of the whole population), which differs from the traditional notion of individual diversity (i.e., the dissimilarity of items recommended to an individual user).
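The choice entropy and its population aggregate in Eq. (11) can be computed as below. This sketch assumes the same multinomial logit with a null option as above, sums only over items in $A$ (the null reaction is excluded, as in Eq. (11)), and represents each user as a tuple $(f_u, \theta_u, [(p_u(A), \text{item utilities}), \ldots])$; that data layout and the names are hypothetical.

```python
import math

def choice_probs(theta_u, item_utils):
    """p_u(R = i | A) for each item in A (null option with utility theta_u)."""
    m = max([theta_u] + list(item_utils))
    z = math.exp(theta_u - m) + sum(math.exp(t - m) for t in item_utils)
    return [math.exp(t - m) / z for t in item_utils]

def choice_entropy(theta_u, item_utils):
    """H_u(R | A) = -sum_{i in A} p log p over items only (null excluded)."""
    return -sum(p * math.log(p) for p in choice_probs(theta_u, item_utils) if p > 0)

def consumption_diversity(users):
    """Eq. (11): sum_u f_u * sum_A p_u(A) * H_u(R | A)."""
    return sum(f_u * sum(p_a * choice_entropy(theta_u, utils) for p_a, utils in actions)
               for f_u, theta_u, actions in users)
```

With $\theta_u = 0$ and three equal-utility items, each item has probability 0.25, so the entropy is $0.75 \log 4$.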

4.3 Implications of CCF and Future Work 

We finally remark that the proposed CCF model has some interesting properties. First, since the games in user-system interactions are finite, there exists an equilibrium point (i.e., a stable strategy). In fact, since the reaction to a given action is rational and the action given $p_u(R|A)$ is vNM-rational, it can be shown that the CCF model approximates the perfect Nash equilibrium [16]. From a practical point of view, however, it is possible to optimize recommender systems more aggressively, beyond the market equilibrium. In particular, the analogy between a recommender system and a monopoly market offers a number of important perspectives, e.g., the reflection of price discrimination in recommender systems: how a recommender system can exploit its market power to transfer the consumer surplus 5. Another interesting topic is to explore the correlation and conflict of goods, and to optimize action $A$ as a bundle based on portfolio theory [20]. We leave these interesting discussions for future research.

5. EXPERIMENTS 

We test the proposed CCF framework on a real-world commercial recommender system. Because CCF consists of two components, it is necessary to test each of them separately; otherwise, it would be difficult to tell whether a change in performance is due to one component, the other, or both. Our experiments therefore consist of two test-beds.

*The well-known 80-20 rule, or Pareto principle, states that, of the many goods available, consumption is concentrated on a small subset of bestselling ones.
First, we compare the proposed conditional reaction model (i.e., the multinomial logit factor (MLF) model) in our CCF framework (referred to as CCF II) with the plain CCF preference model proposed in our prior work [32] (referred to as CCF I), as well as with state-of-the-art CF baselines, in terms of their abilities in preference estimation; to maintain a fair comparison, recommendations for CCF II are made without action optimization, i.e., via simple utility-based ranking. This comparison shows how effective our MLF model for $p_u(R|A)$ is compared with state-of-the-art preference models. Second, we compare the CCF framework (i.e., MLF + action optimization) with the conventional recommendation scheme (i.e., CF + utility-based ranking). This comparison demonstrates how much the game-theoretic formulation, and action optimization in particular, further enhances recommendation performance.

5.1 Data 

We collected a large-scale set of user-system interaction traces from a commercial news article recommender system. In each interaction, the system offers four personalized articles to the visiting user, and the user chooses one of them by clicking to read that article. The recommendations change dynamically over time, even during the user's visit. The system logs every click event of every user visit. It also records the articles presented to users at a series of discrete time points. To obtain the action set for each user-system interaction, we therefore trace back to the closest recording time point right before the user click and use the articles presented at that time point as the action set for the current session. We collected such interaction traces from over one month of logged records. We use a random subset containing 3.6 million users, 2500 items and over 110 million interaction traces. Learning an effective recommender on this data set is particularly challenging because the article pool is dynamically refreshed and each article has a lifetime of only several hours: it appears only once within a particular day, is then pulled from the pool and never appears again.
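The trace-back step that attaches an action set to each click amounts to a predecessor search over sorted snapshot times. This is a minimal sketch assuming snapshots are stored as a sorted list of timestamps with a parallel list of article lists, and that a snapshot taken at exactly the click time counts as "before" the click; the function name is hypothetical.

```python
import bisect

def action_set_for_click(snapshot_times, snapshots, click_time):
    """Articles shown at the latest snapshot at or before click_time.

    snapshot_times must be sorted ascending; snapshots[j] lists the articles
    recorded at snapshot_times[j]. Returns None if no snapshot precedes the click."""
    j = bisect.bisect_right(snapshot_times, click_time) - 1
    return snapshots[j] if j >= 0 else None
```

Binary search keeps the lookup at O(log T) per click for T snapshots, which matters when matching over 110 million interaction traces.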

5.2 CCF Without Action Optimization 

We first evaluate CCF without action optimization (CCF II) in comparison with the plain CCF preference model (i.e., CCF I) and the two CF models described in Section 2, where recommendations are made by utility-based ranking. We consider the following two evaluation settings, one offline and the other online.

Offline evaluation. We evaluate the learned recommender models in terms of top-k ranking performance on a held-out test subset. We use three standard information retrieval measures as evaluation metrics, i.e., average precision at position n (AP@n), average recall at n (AR@n) and normalized discounted cumulative gain at n (nDCG@n), where n = 4, the default recommendation size used in the news recommender system.

Online evaluation We further conduct an online test. In 
particular, for each incoming interaction, we use the 

Note that the essential differences between CCF I and II are merely: (1) CCF II models null reactions and response propensities; (2) CCF II models position bias.



Table 2: Offline test: comparison of top-k ranking performance.

Model            AP@4    AR@4    nDCG@4
30% training
  CF-ℓ2          0.245   0.261   0.255
  CF-Logistic    0.246   0.263   0.257
  CCF I          0.262   0.278   0.274
  CCF II         0.267   0.279   0.278
50% training
  CF-ℓ2          0.250   0.273   0.268
  CF-Logistic    0.252   0.276   0.269
  CCF I          0.266   0.285   0.278
  CCF II         0.269   0.284   0.281
70% training
  CF-ℓ2          0.253   0.275   0.271
  CF-Logistic    0.253   0.276   0.274
  CCF I          0.267   0.287   0.280
  CCF II         0.271   0.284   0.282



Table 3: Online test: comparison of conditional reaction prediction accuracy.

Model            30% train   50% train   70% train
Random           0.250       0.250       0.250
CF-ℓ2            0.337       0.343       0.347
CF-Logistic      0.341       0.345       0.347
CCF I            0.377       0.385       0.391
CCF II           0.383       0.387       0.392



trained models to predict the user's choice reaction, i.e., which item among the four recommended ones will be taken by the user. This prediction directly assesses the accuracy of the MLF model in user reaction modeling.

Offline test results. In this setting, we train each model on progressive proportions of 30%, 50% and 70% of randomly sampled training data, respectively, and evaluate each trained model in terms of offline top-k ranking performance. The results are reported in Table 2. Since the data set is fairly large, the standard deviations of all values are well below 0.001; consequently, we omit them from the results. As can be seen from the table, the CCF II (i.e., MLF) model clearly outperforms the two CF baselines on all three evaluation metrics. Specifically, CCF II gains up to 9.0% improvement over the two CF models in terms of average precision, up to 6.9% in terms of average recall and up to 8.9% in terms of nDCG. Moreover, by modeling position bias and response propensity, CCF II also outperforms CCF I in most (7 out of 9) of the comparisons. Note that even compared to CCF I, the improvements achieved by CCF II are significant (e.g., for the system we worked on, any improvement of the dashboard metrics, especially nDCG or CTR, greater than 0.1% is a significant breakthrough). Also note that the offline results obtained by CCF are quite satisfactory. For example, the average precision is up to 0.271, which means that, out of the four recommended items, on average 1.1 are truly "relevant" (i.e., actually clicked by the user). This performance is quite promising, especially considering that most of the articles in the content pool are transient and subject to dynamic updating.

Online test results. We further evaluate the online performance of each compared model by assessing its predictions of user reaction. In particular, for each incoming responded visit $(u_t, A_t, i_t^*)$, we ask: "among all the recommended items $i \in A_t$, which one will most likely be clicked?" We use the trained model to rank the items in $A_t$ and compare the top-ranked item with the actual choice of the user (i.e., $i_t^*$). We evaluate the results in terms of prediction accuracy. The results are given in Table 3. Because the size of each offer set in the current data set is 4, a random predictor yields 0.25. As can be seen from the table, while both CF models and both CCF models obtain significantly better predictions than the random predictor, the two CCF models further clearly outperform the two CF baselines, with CCF II consistently performing best. In particular, CCF II improves the reaction prediction accuracy by 13.7% compared with least-squares CF, by 12.7% compared with logistic CF and by 1.6% compared with CCF I. According to a t-test with significance level 0.01, all the improvements are statistically significant.

Figure 1: Top: Position bias in user choice reaction; Bottom: Comparison of recommendation performance (AP@4, AR@4, nDCG@4) before and after modeling position bias.

Impact of position bias. We observe significant position bias in the news recommender system. As shown in Figure 1 (top), the left-most and right-most positions (i.e., positions 1 and 4) receive significantly higher click rates than the two middle ones (i.e., positions 2 and 3). In the bottom panel, we show the recommendation performance of the CCF (i.e., MLF) model before and after incorporating the position bias factors (i.e., $\beta$ in the MLF model). We can see from this figure that the performance improvements from CCF I to CCF II can be attributed mostly to the position bias factor. Further experiments confirm that the propensity factor contributes only a marginal improvement in nDCG.

Impact of parameters. The performance of the MLF model is affected by the settings of the latent dimensionality $k$ as well as the regularization weights $\lambda_i$ and $\lambda_u$. In Figure 2, we illustrate how the offline top-k ranking performance changes as a function of these parameters, where we use the same value for both $\lambda_i$ and $\lambda_u$. Here we only report results with the nDCG@4 measure because the results show a similar tendency when other measures (including the reaction accuracy) are used. As can be seen from the figure, the nDCG curves typically have an inverted U-shape, with the optimal values achieved in the middle. In particular, for the MLF model, a dimensionality around 50-100 and a regularization weight around 0.0001 yield the best performance; these are also the default parameter settings used to obtain the results reported in this paper.

Figure 2: Offline top-k ranking performance (nDCG@4) as a function of latent dimensionality $k$ (top) and regularization weight $\lambda$ (bottom).

5.3 CCF With Action Optimization 

We now move on to evaluate the entire CCF framework (i.e., MLF + action optimization) in terms of its ability to achieve the three strategic goals.

Evaluation metrics. We test a recommendation model by applying it on top of the algorithm in production and comparing the results with the production baseline. To assess performance, we report the relative surplus. In particular, let $m$ denote one of the three measures (i.e., click-through rate, sales revenue or consumption diversity); the relative surplus score is defined by:

$$\text{relative surplus} = \frac{m(\text{model}) - m(\text{production})}{m(\text{production})}$$

Evaluation protocol. To illustrate how effective action optimization can be, we compare CCF with action optimization (CCF+AO) to CCF without action optimization (CCF-AO), as well as to the conventional recommendation scheme (collaborative filtering with utility-based ranking, or CF+RK). For each model, we simulate its relative surplus score by applying the model to the production output. In particular, we take the top 50K users who visit our website most frequently as test probes and trace them for one month. For each such user $u$ and each date $d$, we maintain a positive set $P_{u,d}$ and a negative set $N_{u,d}$ by including all the articles that user $u$ reads on date $d$ in $P_{u,d}$ and any other items in the content pool of date $d$ in $N_{u,d}$. We assume that user $u$ takes items only from $P_{u,d}$ and ignores those in $N_{u,d}$ on date $d$. Specifically, the reaction of user $u$ on date $d$ to any action $A$ is assumed to be as follows: for any item $i \in A$, if $A \cap P_{u,d} \ne \emptyset$ and $i \in A \cap P_{u,d}$, $u$ takes $i$ with probability $1/|A \cap P_{u,d}|$, and otherwise ignores it; a nonresponded session occurs when $A \cap P_{u,d} = \emptyset$. To compute sales revenue, we randomly assign each item a positive number as its "price", which is predefined and never changed throughout the evaluation. Moreover, maximizing consumption diversity alone leads to meaningless random recommendations; we therefore impose a hard constraint ensuring that the decrease in CTR is no more than 0.5%.

Due to heavy computational cost, these results were obtained on a relatively small subset of the data.

Figure 3: Performance in achieving strategic objectives: relative surplus compared to the production baseline in terms of CTR, SR and CD.
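The simulated reaction rule and the relative surplus score can be sketched together. This is a minimal sketch assuming the reaction rule as stated above (a uniform click over $A \cap P_{u,d}$, or a nonresponded session when that intersection is empty); the function names are hypothetical and the production system's own evaluation code is not shown in the paper.

```python
import random

def simulate_reaction(action, positives, rng):
    """Click one item of A ∩ P_{u,d} uniformly at random (probability
    1/|A ∩ P_{u,d}| each); return None for a nonresponded session."""
    hits = [i for i in action if i in positives]
    return rng.choice(hits) if hits else None

def relative_surplus(m_model, m_production):
    """(m(model) - m(production)) / m(production)."""
    return (m_model - m_production) / m_production
```

Aggregating the simulated clicks (weighted by the assigned prices for SR, or by the choice distribution for CD) over the probe users and dates gives $m(\text{model})$, which is then compared with $m(\text{production})$ through `relative_surplus`.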

Results and analysis. The aggregate results on the 50K probe users are depicted in Figure 3. Applying a traditional recommendation scheme (CF + preference-based ranking) on top of the production baseline yields only marginal improvements in CTR and SR. In contrast, CCF gains up to 4.5% and 3.9% surplus in CTR and SR, respectively, and action optimization further significantly enhances these numbers. Interestingly, in terms of consumption diversity, our experiment confirms the findings of [9]. For example, applying CF and CCF-AO directly, without consideration of CD, inevitably leads to consumption concentration, as shown by the negative surplus scores in Figure 3. In contrast, CCF+AO is the only one of the three models that yields a positive surplus in CD. In particular, with less than 0.5% reduction of CTR, it gains up to 3.2% improvement in diversity. These observations are somewhat surprising considering that the preliminary action parametrization we used in the experiment is rather simplistic: it contains only a single parameter $\alpha$ for simple randomization (cf. Section 4.2.2). In future work, we plan to explore more flexible forms of action parametrization, such as the sequential factorization model mentioned in Section 4.2.2, and we expect even more promising results.

6. SUMMARY 

We presented a novel game-theoretic framework for recommendation that views user-system interactions in a recommender system as buyer-seller interactions in a monopoly economic market. Since the decisions of the user (buyer) and the system (seller) are interdependent, this new perspective motivates us to optimize the action strategy of the system by first predicting users' reactions and then adapting its action to maximize the expected payoff. The extended CCF framework consists of two essential components: (1) a model for $p_u(R|A)$ that integrates choice models from econometrics and latent factor models from collaborative filtering to encode the notion of collaborative games; and (2) a formulation for optimizing the system action $A$ in terms of expected strategic payoffs such as click-through rate, sales revenue and consumption diversity. Experiments on a real-world commercial recommender system have demonstrated the effectiveness and promise of the proposed framework.